Abstract

Asymmetric statistical errors arise for experimental results obtained by Maximum
Likelihood estimation, in cases where the number of results is finite and the log likelihood
function is not a symmetric parabola. This note discusses how separate asymmetric errors
on a single result should be combined, and how several results with asymmetric errors
should be combined to give an overall measurement. In the process it considers several
methods for parametrising curves that are approximately parabolic.



1. Introduction 

When an experimental result is presented as $x^{+\sigma^+}_{-\sigma^-}$ this signifies, just as with the usual
form $x \pm \sigma$, that $x$ is the value given by a 'best' estimate (i.e. one with good properties
of consistency, efficiency, and lack of bias) and that the 68% central confidence region is
$[x - \sigma^-, x + \sigma^+]$.

Such asymmetric errors arise through two common causes. The first is when a nuisance 
parameter a has a conventional symmetric (even Gaussian) probability distribution, but 
produces a non-linear effect on the desired result x. These errors are generally systematic 
rather than statistical, and their probability distribution is generally best considered from 
a Bayesian viewpoint. Their treatment has been considered in a previous note [1]. 

The second cause of asymmetry is the extraction of a result $x$ through the maximisation
of a likelihood function $L(x)$ whose logarithm is not a symmetric parabola. This occurs because
the log likelihood is in general only parabolic in the limit when the number of results $N$, the
number of terms contributing to the sum which makes up the log likelihood, is large, and
for many results this is not the case. For such a function the errors are conventionally read
off the points at which the log likelihood falls by $\frac{1}{2}$ from its peak, though this is not exact
[2] and it may be better to obtain the errors from a toy Monte Carlo computation.

Although such asymmetric errors are frequently used in the reporting of particle
physics results, constructive analyses of their use are scarce in the literature [3].

2. Two Combination Problems 

The two most significant questions on the manipulation of asymmetric errors are the 
Combination of Results and the Combination of Errors. 

2.1 Combination of Results 

The first occurs when one has two results $x_1{}^{+\sigma_1^+}_{-\sigma_1^-}$ and $x_2{}^{+\sigma_2^+}_{-\sigma_2^-}$ of the same quantity.

This arises when two different experiments measure the same quantity. Assuming that
they are compatible (according to some criterion), one wants the appropriate value (and
errors) that combines the two. This is the equivalent of the well-known expression for
symmetric errors

$$\hat{x} = \frac{x_1/\sigma_1^2 + x_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2} \pm \frac{1}{\sqrt{1/\sigma_1^2 + 1/\sigma_2^2}} \qquad (1)$$

If the log likelihood functions $\ln L_1(x)$ and $\ln L_2(x)$ are known, then the combined log
likelihood is just the sum of the two. The maximum can then be found and the errors read off
the $\Delta\ln L = -\frac{1}{2}$ points.

The question naturally extends to more than two results, and it is clearly a desirable
property that the operation be associative: if results are combined pairwise till only
one remains, then the pairing strategy should not affect the result. For the addition of
likelihoods this obviously holds.
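
As a quick numerical check of Equation 1 and of the associativity requirement, the following is a minimal sketch in Python. The function name `combine_symmetric` and the three input results are illustrative assumptions, not values taken from this note.

```python
import math

def combine_symmetric(x1, s1, x2, s2):
    """Inverse-variance weighted mean of two results with symmetric errors (Equation 1)."""
    w1, w2 = 1.0 / s1**2, 1.0 / s2**2
    value = (x1 * w1 + x2 * w2) / (w1 + w2)
    error = 1.0 / math.sqrt(w1 + w2)
    return value, error

# Hypothetical results (value, symmetric error), chosen only for illustration.
a, b, c = (1.0, 0.5), (2.0, 1.0), (4.0, 2.0)

# Combine (a, b) first and then c, or a with the pair (b, c).
left = combine_symmetric(*combine_symmetric(*a, *b), *c)
right = combine_symmetric(*a, *combine_symmetric(*b, *c))
print(left, right)
```

Both pairing orders agree to rounding error, because the weights $1/\sigma_i^2$ simply add. Any procedure for asymmetric errors should ideally share this property.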

2.2 Combination of Errors 

The second question arises when a particular result (taken, without loss of generality,
as zero) is subject to several separate (asymmetric) uncertainties, and one needs to quote
the overall uncertainty. An obvious example would be the uncertainty due to background
subtraction where the background has several different components, each with asymmetric
uncertainties. This is the equivalent of the well-known expression for symmetric errors

If $x = x_1 + x_2$ then $\sigma^2 = \sigma_1^2 + \sigma_2^2$ (2)

Again, it is desirable that the operation be associative.

If the likelihood functions are known then the joint function $L_1(x_1)L_2(x_2)$ is defined
on the $(x_1, x_2)$ plane with its peak at $(0,0)$. The uncertainty on the sum $x_1 + x_2$ is found
by the profiling technique: we find $L(x_1 + x_2)$, the peak value of the likelihood anywhere
on the line $x_1 + x_2 = \mathrm{constant}$, and the $\Delta\ln L = -\frac{1}{2}$ errors can be read off from this [4].

To explain why this works (and when it doesn't), consider first a case where the answer
is easily found: suppose $x_1$ and $x_2$ are both Gaussian, with mean zero and the same standard
deviation $\sigma$. The log likelihood can then be rewritten using $u = x_1 + x_2$ and $v = x_1 - x_2$:

$$-\frac{x_1^2}{2\sigma^2} - \frac{x_2^2}{2\sigma^2} = -\frac{(x_1+x_2)^2}{4\sigma^2} - \frac{(x_1-x_2)^2}{4\sigma^2} = -\frac{u^2}{4\sigma^2} - \frac{v^2}{4\sigma^2} \qquad (3)$$

The likelihood is the product of two Gaussians (of width $\sqrt{2}\sigma$), one in the combination of
interest $u$, the other in the ignorable combination $v$.

Now for some fixed value of $v$, the likelihood for $u$ is a Gaussian of mean zero, and the
68% central confidence region for $u$ is given by its standard deviation and is of half-width
$\sqrt{2}\sigma$. If $v$ is fixed at some other value, the likelihood for $u$, and the deductions that can
be drawn from it, are the same. Thus one can say 'There is a 68% probability that $u$ lies
in the region $[-\sqrt{2}\sigma, \sqrt{2}\sigma]$, whatever value of $v$ is chosen', and this can legitimately be
shortened by striking out the final condition. And the problem is solved.

To apply this technique in some less transparent case we need to factorise the likelihood
into the form $L_1(x_1)L_2(x_2) = L_u(u)L_v(v)$ where we have freedom to choose the functions
$L_u$, $L_v$, and the form $v(x_1, x_2)$. In some instances this is clearly possible: a double Gaussian
with $\sigma_1 \neq \sigma_2$ can be factorised using $v = \sigma_2^2 x_1 - \sigma_1^2 x_2$. There are also instances, such as
a volcano-crater shaped function, which are manifestly impossible to factorise. These
can readily be proposed as counterexamples, but appear somewhat contrived and it is
reasonable to hope that they might not occur in practical experience, except for very small
$N$.
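
The double-Gaussian case can be checked explicitly. The worked identity below is a sketch assuming zero means, $u = x_1 + x_2$ and $v = \sigma_2^2 x_1 - \sigma_1^2 x_2$ (the squared widths are what the algebra requires); the cross term in $uv$ cancels, which is what makes the factorisation possible.

```latex
x_1 = \frac{\sigma_1^2 u + v}{\sigma_1^2+\sigma_2^2}, \qquad
x_2 = \frac{\sigma_2^2 u - v}{\sigma_1^2+\sigma_2^2}
\quad\Longrightarrow\quad
\frac{x_1^2}{2\sigma_1^2} + \frac{x_2^2}{2\sigma_2^2}
= \frac{u^2}{2(\sigma_1^2+\sigma_2^2)}
+ \frac{v^2}{2\sigma_1^2\sigma_2^2(\sigma_1^2+\sigma_2^2)}
```

The $u$ factor is a Gaussian of variance $\sigma_1^2 + \sigma_2^2$, in agreement with Equation 2.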

On the grounds that if this factorisation is impossible we can get nowhere, let us
assume it to be true and see where that leads us. Finding the explicit forms of $v$ and $L_u$
is complicated and one would like to avoid it. This can be done by noting that:
1: For fixed $v$ the shape of the total likelihood as a function of $u$ is the same
2: For fixed $u$ the shape of the total likelihood as a function of $v$ is the same
(1) tells us that we can study the properties of $L_u(u)$ by fixing on any value of $v$. (2)
tells us that we can fix the value of $v$ by finding the maximum: the likelihood (as a function
of $v$, with $u$ fixed) will always peak at the same value of $v$. Thus for a given $u = x_1 + x_2$
one finds the value of $x_1 - x_2$ at which $L$ is greatest, as that is always the same value of $v$.

Figure 1: 2-D likelihood functions with lines of constant u and constant v

Figure 1 gives an illustration. The left hand plot shows the standard double Gaussian
(shown as a linear function rather than the logarithm, for presentational reasons) as a
function of $x_1$ and $x_2$. The lines of constant $u = x_1 + x_2$ run diagonally, from top left
to bottom right, and the lines of constant $v = x_1 - x_2$ are orthogonal to them, running
from bottom left to top right. For any chosen value of $v$, the profile of the likelihood as
a function of $u$ is the same Gaussian shape, from which 68% limits can be read off, the
same in each case. There is a line of constant $v = 0$ running through the maximum, which
follows the maximum for any chosen $u$.

The right hand plot shows a more interesting function. The lines of constant
$u = x_1 + x_2$ are as before. The lines of constant $v$ are such that the likelihood as a function of
$u$ along them is the same, up to a constant factor. There is a line of constant $v$ through
the maximum which follows the maximum for any chosen $u$.

This construction shows the limits of the technique. For some given $u$ we plot $L$ as a
function of $x_1 - x_2$ and compare it with the same curve for $u = 0$. Then we map the values
of $x_1 - x_2$ onto the corresponding values at $u = 0$ at which the log likelihood falls off from
the peak by the same amount, and these give the lines of constant $v$. If both curves are
single peaks then this is readily done and the mapping is continuous. If there are multiple
peaks then this continuous mapping is not possible. Thus for a simple peak the technique
will work, but not if there are secondary peaks or valleys.

This generalises readily to the case of several variables. The profile likelihood is a
function $L(u)$ where $u = \sum_i x_i$ and $L$ is the maximum value of the likelihood in the
$u = \mathrm{constant}$ hyperplane.
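
The profiling step can be illustrated numerically. The sketch below assumes a test case of two zero-mean Gaussians with widths 3 and 4 (hypothetical values) and uses a brute-force scan along the line $x_1 + x_2 = u$, which is a simple stand-in for a proper maximiser; the function names are illustrative.

```python
import math

def log_l(x1, x2, s1, s2):
    """Joint log likelihood of two zero-mean Gaussians (an assumed test case)."""
    return -0.5 * (x1 / s1) ** 2 - 0.5 * (x2 / s2) ** 2

def profile_log_l(u, s1, s2, half_width=20.0, steps=40001):
    """Largest log likelihood found on the line x1 + x2 = u, by scanning x1 on a grid."""
    best = -math.inf
    for k in range(steps):
        x1 = -half_width + 2.0 * half_width * k / (steps - 1)
        best = max(best, log_l(x1, u - x1, s1, s2))
    return best

s1, s2 = 3.0, 4.0
u_one_sigma = math.hypot(s1, s2)  # Equation 2 predicts a combined width of 5.0 here
print(profile_log_l(u_one_sigma, s1, s2))
```

At $u = \sqrt{\sigma_1^2 + \sigma_2^2}$ the profiled log likelihood has fallen by one half from its peak, so the profile reproduces the quadrature sum in the symmetric case.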



3. Parametrisation of the likelihood function 

Thus both questions can be answered if the likelihood functions are known. In general 
they are not: a quoted result will only give the value and the positive and negative error. 
We therefore need a way to reconstruct, as best we can, the log likelihood function from 
them, using a parametrised curve. 

This curve must go through the three points, having a maximum at the middle one. 
This gives four equations, and hence the curve will have four parameters, obtainable from 
the quoted values of the peak and the positive and negative errors. (The fourth parameter 
is an additive constant which controls the value of the function at its maximum, which is 
in fact irrelevant for our purposes.) It must also behave in a 'reasonable' fashion elsewhere. 



Various possibilities have been tried, and tested against the log likelihood curves
where the true value is known, such as the Poisson and the log of a Gaussian variable. For
simplicity in what follows we take the quoted value as zero, and work with just $\sigma_+$ and $\sigma_-$
as input parameters.

3.1 Form 1: a cubic 

Adding a cubic term is the obvious step

$$f(x) = -\frac{1}{2}\left(\alpha x^2 + \beta x^3\right) \qquad (4)$$

with the coefficients readily obtained, from the requirement $f(\sigma_+) = f(-\sigma_-) = -\frac{1}{2}$, as
$\alpha = \frac{\sigma_+^2 - \sigma_+\sigma_- + \sigma_-^2}{\sigma_+^2\sigma_-^2}$, $\beta = \frac{\sigma_- - \sigma_+}{\sigma_+^2\sigma_-^2}$. Extension
to several values has some consistency, as adding cubics will give another cubic, but
associativity is not guaranteed.

This gives curves which will behave sensibly in the $[x - \sigma_-, x + \sigma_+]$ range, but outside
that the $x^3$ term produces an unwanted turning point and the curve does not go to $-\infty$
for both large positive and large negative $x$.

3.2 Form 2: A constrained quartic 

A quartic curve can be constrained to give only one maximum by making the second
derivative a perfect square:

$$f''(x) = -(\alpha + \beta x)^2, \qquad f(x) = -\frac{1}{2}\left(\alpha^2 x^2 + \frac{2\alpha\beta x^3}{3} + \frac{\beta^2 x^4}{6}\right) \qquad (5)$$

The parameters are given by

$$\beta = \frac{1}{\sigma_+\sigma_-}\sqrt{\frac{6(\sigma_- + \sigma_+)^2 \pm 12\sqrt{4\sigma_+\sigma_-^3 + 4\sigma_-\sigma_+^3 - 2\sigma_+^4 - 2\sigma_-^4}}{3\sigma_-^2 + 2\sigma_-\sigma_+ + 3\sigma_+^2}} \qquad (6)$$

Here the negative sign in the expression for $\beta$ should be chosen to give a quartic term which
is small. In very asymmetric cases ($\sigma_-$ and $\sigma_+$ differing by more than about a factor of 2)
the argument of the inner square root is negative, indicating that there is no solution of the
desired form. Given $\beta$, one then solves for $\alpha$

$$\alpha = -s\,\frac{\beta\sigma}{3} \pm \sqrt{\frac{1}{\sigma^2} - \frac{\beta^2\sigma^2}{18}} \qquad (7)$$

for both $\sigma = \sigma_+$ (with $s = +1$) and $\sigma = \sigma_-$ (with $s = -1$, as that point lies at $x = -\sigma_-$),
and selects the solution which is common to both.

Combination again gives closure, in that the sum of two quartics (with second derivative
everywhere negative) is a quartic (with second derivative everywhere negative).

This form gives rather better large $x$ behaviour but is not always satisfactory in the
range between $-\sigma_-$ and $\sigma_+$.



3.3 Form 3: Logarithmic 

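One can also use a logarithmic approximation

$$f(x) = -\frac{1}{2}\left(\frac{\ln(1 + \gamma x)}{\ln\beta}\right)^2 \qquad (8)$$

where

$$\beta = \frac{\sigma_+}{\sigma_-}, \qquad \gamma = \frac{\sigma_+ - \sigma_-}{\sigma_+\sigma_-} \qquad (9)$$

This is easy to write down and work with, and has some motivation, as it describes the
expansion/contraction of the abscissa variable at a constant rate. Its unpleasant features
are that it is undefined for values of $x$ beyond some point in the direction of the smaller
error, as $1 + \gamma x$ goes negative, and that it does not reduce directly to a parabola in the
$\sigma_+ = \sigma_-$ limit.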

3.4 Form 4: Generalised Poisson 

Starting from the Poisson likelihood $\ln L(x) = -x + N\ln x - \ln N!$ one can generalise to

$$f(x) = -\alpha(x + \beta) + \nu\ln\alpha(x + \beta) + \mathrm{const} \qquad (10)$$

using $\nu$, a continuous variable, to give skew to the function, and then scaling and shifting
using $\alpha$ and $\beta$. Putting the maximum at the right place requires $\nu = \alpha\beta$ and thus, adjusting
the constant for convenience to make the peak value zero:

$$f(x) = -\alpha x + \nu\ln\left(1 + \frac{\alpha x}{\nu}\right) \qquad (10a)$$

Writing $\gamma = \alpha/\nu$ the equations at $-\sigma_-$ and $\sigma_+$ lead to

$$\frac{1 - \gamma\sigma_-}{1 + \gamma\sigma_+} = e^{-\gamma(\sigma_- + \sigma_+)} \qquad (11)$$

This has to be solved numerically. It has a solution between $\gamma = 0$ and $\gamma = 1/\sigma_-$ which
can be found by bisection. (Attempts to use more sophisticated algorithms failed.)
Given the value of $\gamma$, $\nu$ is then found from

$$\nu = \frac{1}{2\left(\gamma\sigma_+ - \ln(1 + \gamma\sigma_+)\right)} \qquad (12)$$

This form did fairly well with many of the tests, but the extraction of the function
parameters from $\sigma_-$ and $\sigma_+$ is inelegantly numerical.
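
The numerical step can be sketched as follows, assuming $\sigma_+ > \sigma_-$ so that a non-trivial root of Equation 11 lies strictly between 0 and $1/\sigma_-$. The function names, the bracketing offsets and the tolerance are illustrative choices, not details of the original implementation.

```python
import math

def _eq11(gamma, sm, sp):
    """Log of Equation 11: zero when (1 - gamma*sm)/(1 + gamma*sp) = exp(-gamma*(sm + sp))."""
    return math.log1p(-gamma * sm) - math.log1p(gamma * sp) + gamma * (sm + sp)

def solve_generalised_poisson(sm, sp, tol=1e-12):
    """Return (gamma, nu, alpha) for errors -sm/+sp, using plain bisection."""
    if not sp > sm > 0.0:
        raise ValueError("this sketch assumes sigma_plus > sigma_minus > 0")
    lo = 1e-6 / sm              # just above the trivial root gamma = 0
    hi = (1.0 - 1e-12) / sm     # just below the singularity at gamma = 1/sigma_minus
    if _eq11(lo, sm, sp) <= 0.0 or _eq11(hi, sm, sp) >= 0.0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _eq11(mid, sm, sp) > 0.0:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    nu = 1.0 / (2.0 * (gamma * sp - math.log1p(gamma * sp)))  # Equation 12
    alpha = gamma * nu
    return gamma, nu, alpha

# Errors quoted later in this note for a Poisson measurement of 5 events.
gamma, nu, alpha = solve_generalised_poisson(1.916, 2.581)
print(gamma, nu, alpha)
```

For a genuine Poisson likelihood with $N = 5$ the exact parameters are $\alpha = 1$, $\nu = 5$, $\gamma = 0.2$, so the fitted values should land close to these, limited only by the rounding of the input errors.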



3.5 Form 5: Variable Gaussian (1) 

Another function is motivated by the Bartlett technique for maximum likelihood errors
[2,5]. This assumes (and indeed justifies) that the likelihood function for a result $\hat{x}$ from a
true value $x$ is described with good accuracy by a Gaussian whose width depends on the
value of $x$.

$$\ln L(\hat{x}; x) = -\frac{1}{2}\left(\frac{\hat{x} - x}{\sigma(x)}\right)^2 \qquad (13)$$

This does not include the $-\ln\sigma(x)$ term from the denominator of the Gaussian. However
it turns out [2] that omitting this term actually improves the accuracy of the $\Delta\ln L = -\frac{1}{2}$
errors, bringing them into line with the Bartlett form.

We make the further assumption that in the neighbourhood of interest this variation
in standard deviation is linear

$$\sigma(x) = \sigma + \sigma'(x - \hat{x}) \qquad (14)$$

$$\ln L(\hat{x}; x) = -\frac{1}{2}\left(\frac{\hat{x} - x}{\sigma + \sigma'(x - \hat{x})}\right)^2 \qquad (15)$$

The requirement that this go through the $-\frac{1}{2}$ points gives

$$\sigma = \frac{2\sigma_+\sigma_-}{\sigma_+ + \sigma_-}, \qquad \sigma' = \frac{\sigma_+ - \sigma_-}{\sigma_+ + \sigma_-} \qquad (16)$$

Thus the parameters are easy to find, and when $\sigma_- = \sigma_+$ the symmetric case is smoothly
incorporated.

3.6 Form 6: Variable Gaussian (2) 

Still using the Bartlett-inspired form, we could alternatively take the variance as linear

$$V(x) = V + V'(x - \hat{x}) \qquad (17)$$

and

$$\ln L(\hat{x}; x) = -\frac{1}{2}\,\frac{(\hat{x} - x)^2}{V + V'(x - \hat{x})} \qquad (18)$$

and the parameters are again easy to find, and sensible if $\sigma_- = \sigma_+$

$$V = \sigma_-\sigma_+, \qquad V' = \sigma_+ - \sigma_- \qquad (19)$$
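
The two variable-width Gaussian forms are simple enough to write down directly. The sketch below is a minimal Python rendering of Equations 15, 16, 18 and 19; the function names are illustrative assumptions, and the inputs are the errors quoted later in this note for a Poisson measurement of 5 events.

```python
import math

def lnl_linear_sigma(x, xhat, sm, sp):
    """Equation 15, with sigma and sigma' taken from Equation 16."""
    sigma = 2.0 * sp * sm / (sp + sm)
    dsigma = (sp - sm) / (sp + sm)
    return -0.5 * ((x - xhat) / (sigma + dsigma * (x - xhat))) ** 2

def lnl_linear_variance(x, xhat, sm, sp):
    """Equation 18, with V and V' taken from Equation 19."""
    v = sp * sm
    dv = sp - sm
    return -0.5 * (x - xhat) ** 2 / (v + dv * (x - xhat))

xhat, sm, sp = 5.0, 1.916, 2.581
for f in (lnl_linear_sigma, lnl_linear_variance):
    # Both forms pass through -1/2 at xhat - sm and xhat + sp by construction.
    print(f.__name__, f(xhat - sm, xhat, sm, sp), f(xhat + sp, xhat, sm, sp))

# Comparison with the exact Poisson log likelihood (peak shifted to zero) at x = 10.
exact = -(10.0 - 5.0) + 5.0 * math.log(10.0 / 5.0)
print(exact, lnl_linear_sigma(10.0, xhat, sm, sp), lnl_linear_variance(10.0, xhat, sm, sp))
```

Well outside the central region the linear-variance form stays closer to the exact Poisson curve than the linear-$\sigma$ form, which is the behaviour reported for the Poisson example in this section.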



3.7 Example: Approximating a Poisson likelihood

Figure 2: Approximations to a Poisson likelihood

Figure 2 shows in black the likelihood function for a Poisson measurement of 5 events.
In red are the approximations, constrained to peak at $x = 5$ and to go through the $-\frac{1}{2}$
points, indicated by the horizontal line. They all do well interpolating in that region,
but outside it their behaviour is very different. The polynomial forms diverge significantly
from the truth. The logarithmic form does fairly well, and the generalised Poisson does
perfectly (as it should for a Poisson likelihood). The variable width Gaussian models both
do quite well, but the one with linear variance does noticeably better than the form linear
in the standard deviation.



3.8 Example: Approximating a Logarithmic measurement

Figure 3: Approximations to the likelihood of the log of a Gaussian measurement

Figure 3 shows the same approximations, fitting a measurement of $x = \ln y$, where $y$
is a Gaussian measurement with the value $8 \pm 3$.

Again, all perform well in the central region, and the polynomial forms diverge badly
outside that region, though the quartic does adequately on the positive side and down to
about $-2\sigma_-$ from the peak. The logarithmic curve does fairly well, but the generalised
Poisson is not so good. The variable width Gaussians both do well, but in this case the
linear $\sigma$ form does markedly better than the linear variance form.



We can conclude that the variable width Gaussians are the best approximation for our
purpose, having good descriptive power together with parameters that are readily obtained
from Equations 16 or 19, but that the choice between the linear $\sigma$ or linear $V$ form is one
that the user has to make on a case by case basis. Likelihood functions based on a Poisson
measurement will be better represented by the linear $V$ form.

4. Procedure for combination of results 

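Working with a variable-width Gaussian parametrisation the likelihood function for a
set of measurements $x_i$ is

$$\ln L(x) = -\frac{1}{2}\sum_i\left(\frac{x - x_i}{\sigma_i(x)}\right)^2 \qquad (20)$$

with $\sigma_i(x)^2 = V_i(x)$ in the linear $V$ form.

For the linear $\sigma$ form, the position of the maximum $\hat{x}$ is given by the equation

$$\hat{x}\sum_i w_i = \sum_i x_i w_i \quad\text{with}\quad w_i = \frac{\sigma_i}{\left(\sigma_i + \sigma_i'(\hat{x} - x_i)\right)^3} \qquad (21)$$

For the linear $V$ form the corresponding equation is

$$\hat{x}\sum_i w_i = \sum_i w_i\left(x_i - \frac{V_i'}{2V_i}(\hat{x} - x_i)^2\right) \quad\text{with}\quad w_i = \frac{V_i}{\left(V_i + V_i'(\hat{x} - x_i)\right)^2} \qquad (22)$$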

The algebra is simple, and has been implemented in a Java applet, obtainable under

http://www.slac.stanford.edu/~barlow/statistics.html



Equations 21 and 22 are nonlinear in $\hat{x}$, and the solution is found by iteration:
$\frac{1}{N}\sum_i x_i$ can be taken as a first guess for $\hat{x}$, and this is used in the right hand side of the
equation to give an improved value. The implementation deems it to have converged if the
step size is less than a small preset fraction of the total range of interest, defined as from
$3\sigma_-$ below the lowest point to $3\sigma_+$ above the highest. In practice such convergence occurs
after a few iterations.

The $\Delta\ln L = -\frac{1}{2}$ points of the function of Equation 20 are also found numerically.
The function is reasonably linear over the region where the iteration is performed, and
again convergence is rapid: an initial value is taken, inspired by Equation (1), as the
inverse root sum of the inverse squares of the positive or negative, as appropriate, errors.
Small steps are taken until the $-\frac{1}{2}$ line is crossed, and successive linear interpolation is
then done until the value is within a small preset tolerance of 0.5. Again, only a few
iterations are required for a typical case.
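
The whole procedure can be sketched in a few lines of Python. This is not the Java applet: the function names, tolerances and step size are illustrative assumptions, and the $\Delta\ln L = -\frac{1}{2}$ points are located by stepping followed by bisection rather than by the successive linear interpolation described above.

```python
def _params(sm, sp, model):
    """(sigma, sigma') from Equation 16 for model='sigma', else (V, V') from Equation 19."""
    if model == "sigma":
        return 2.0 * sp * sm / (sp + sm), (sp - sm) / (sp + sm)
    return sp * sm, sp - sm

def log_l(x, results, model):
    """Equation 20 for a list of (value, sigma_minus, sigma_plus) results."""
    total = 0.0
    for xi, sm, sp in results:
        a, b = _params(sm, sp, model)
        d = x - xi
        if model == "sigma":
            total -= 0.5 * (d / (a + b * d)) ** 2
        else:
            total -= 0.5 * d * d / (a + b * d)
    return total

def find_peak(results, model, tol=1e-10, max_iter=1000):
    """Fixed-point iteration of Equation 21 or 22, starting from the plain mean."""
    x = sum(r[0] for r in results) / len(results)
    for _ in range(max_iter):
        num = den = 0.0
        for xi, sm, sp in results:
            a, b = _params(sm, sp, model)
            d = x - xi
            if model == "sigma":
                w = a / (a + b * d) ** 3
                num += w * xi
            else:
                w = a / (a + b * d) ** 2
                num += w * (xi - b / (2.0 * a) * d * d)
            den += w
        x_new = num / den
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("peak iteration did not converge")

def find_error(results, model, peak, direction, step=0.01, tol=1e-10, max_steps=100000):
    """Distance from the peak to the point where ln L has fallen by 1/2."""
    target = log_l(peak, results, model) - 0.5
    lo, hi = peak, peak + direction * step
    for _ in range(max_steps):
        if log_l(hi, results, model) <= target:
            break
        lo, hi = hi, hi + direction * step
    else:
        raise RuntimeError("the -1/2 line was never crossed")
    while abs(hi - lo) > tol:  # bisection: lo stays above the target, hi below it
        mid = 0.5 * (lo + hi)
        if log_l(mid, results, model) > target:
            lo = mid
        else:
            hi = mid
    return abs(0.5 * (lo + hi) - peak)

def combine_results(results, model):
    peak = find_peak(results, model)
    return peak, find_error(results, model, peak, -1), find_error(results, model, peak, +1)

# Two identical Poisson-like results of 5 events, quoted as 5 -1.916 +2.581.
pair = [(5.0, 1.916, 2.581), (5.0, 1.916, 2.581)]
print(combine_results(pair, "variance"))
print(combine_results(pair, "sigma"))
```

The model is passed as a string so that each measurement list can be run through both parametrisations and the results compared, mirroring the per-case choice between linear $\sigma$ and linear $V$.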

The value of the function at the peak gives the $\chi^2$ for the result, and this can be used
to judge the compatibility of the different results. (The number of degrees of freedom is
just one less than the number of values being combined.)

Figure 4: Three parametrised likelihood curves and their sum

Figure 4 shows the graphical result of combining three results, with central values 1.9,
2.4 and 3.1, each with asymmetric errors. The
upper black line shows the peak value (which, as mentioned earlier, is not relevant and
therefore set to zero). The lower black line shows $\ln L = -\frac{1}{2}$. The 3 blue curves are the
three parametrised likelihood curves (using linear $\sigma$). It can be seen that they do indeed
each go through their 3 known values correctly. Otherwise we have no precise knowledge
of what they should look like, but they are apparently well behaved.



The red curve is the sum of the three blue curves (again, adjusted to have a peak value 
of zero.) The position of the peak, found as described above, is indicated by the short 
vertical red line, and the horizontal red line indicates the 68% confidence interval, again 
obtained as described above. One can thus verify by eye that the numerical techniques are 
giving sensible answers. 



Results are also given numerically, as shown in Figure 5. Values and errors are given,
and each measurement may be specified as being linear in $\sigma$ or $V$ using the right hand
button. On pressing the bottom left button, the graph above is drawn and the numerical
values displayed. There are also facilities to add more values (up to a limit of 10).

Figure 5: The user interface, showing input values, output values and options

4.1 Example of combination of results 

Suppose a counting experiment sees 5 events. The result is quoted (using the $\Delta\ln L = -\frac{1}{2}$
errors, even though this is a case where the full Neyman errors could be given) as
$5^{+2.581}_{-1.916}$. Suppose further that it is repeated and the same result is obtained. With the
knowledge of the details we can obtain the combined result just by halving the total
measurement of $10^{+3.504}_{-2.838}$ to give the exact answer of $5^{+1.752}_{-1.419}$. But in general we would not
know this and just be given the measurements, and combine them using the above method.
This (using the linear variance model) gives a combined result of $5^{+1.747}_{-1.415}$. So the combined
value is exact, with discrepancies only in the third decimal place of the errors.

Table 1 shows these, together with the values obtained from other pairs of results with
the same sum.

$x_1$                     $x_2$                      Linear $\sigma$                 Linear $V$
$5^{+2.581}_{-1.916}$     $5^{+2.581}_{-1.916}$      $5.000^{+1.737}_{-1.408}$       $5.000^{+1.747}_{-1.415}$
$6^{+2.794}_{-2.128}$     $4^{+2.346}_{-1.682}$      $5.000^{+1.778}_{-1.432}$       $5.000^{+1.758}_{-1.425}$
$7^{+2.989}_{-2.323}$     $3^{+2.080}_{-1.416}$      $5.038^{+1.936}_{-1.529}$       $5.009^{+1.793}_{-1.456}$
$8^{+3.171}_{-2.505}$     $2^{+1.765}_{-1.102}$      $\ldots_{-1.826}$               $5.055^{+1.855}_{-1.515}$
$9^{+3.342}_{-2.676}$     $1^{+1.358}_{-0.6983}$     $\ldots^{+3.149}$               $\ldots^{+1.942}$

Table 1: Combining results in a case of two samples from the same Poisson distribution

This shows that the technique, especially with the linear variance model, works very
well. There are discrepancies, but these are reasonable given the assumptions that have
had to be made. It is worth pointing out that the larger discrepancies of the final two rows
are produced by rather unlikely experimental circumstances: the probability of 10 events
being split 9:1 or even 8:2 between the two experimental runs is small. (This shows up in
their $\chi^2$ values which are large enough to flag a warning.)

5. Procedure for Combination of Errors

To combine errors when the likelihoods are not given in full, and only the errors are
available, we again parameterise them by the variable Gaussian model

$$\ln L(x) = -\frac{1}{2}\sum_i\left(\frac{x_i}{\sigma_i + \sigma_i' x_i}\right)^2 \quad\text{or}\quad -\frac{1}{2}\sum_i\frac{x_i^2}{V_i + V_i' x_i} \qquad (23)$$

where the $x_i$ represent deviations from the quoted result. Their total is $u = \sum_i x_i$ and to
find $L(u)$ the sum of Equation 23 is maximised, subject to the constraint $\sum_i x_i = u$. The
method of undetermined multipliers gives the solution as

$$x_i = u\,\frac{w_i}{\sum_j w_j} \qquad (24)$$

$$\text{where}\quad w_i = \frac{(\sigma_i + \sigma_i' x_i)^3}{2\sigma_i} \quad\text{or}\quad \frac{(V_i + V_i' x_i)^2}{2V_i + V_i' x_i} \qquad (25)$$

This is a non-linear set of equations. However a solution can be mapped out, starting
at $u = 0$ for which all the $x_i$ are zero. Increasing $u$ in small amounts, Equation 24 is used to
give the small changes in the $x_i$, and the weights are then re-evaluated using Equation 25.
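
A minimal Python sketch of this stepping procedure is given below. It is not the Java program: the function names, the step size `du` and the number of inner re-weighting passes are illustrative assumptions, and the crossing of the $-\frac{1}{2}$ line is located by linear interpolation between the last two steps.

```python
def _params(sm, sp, model):
    """(sigma, sigma') from Equation 16 for model='sigma', else (V, V') from Equation 19."""
    if model == "sigma":
        return 2.0 * sp * sm / (sp + sm), (sp - sm) / (sp + sm)
    return sp * sm, sp - sm

def _log_l(xs, pars, model):
    """Equation 23 for the current deviations xs."""
    total = 0.0
    for x, (a, b) in zip(xs, pars):
        if model == "sigma":
            total -= 0.5 * (x / (a + b * x)) ** 2
        else:
            total -= 0.5 * x * x / (a + b * x)
    return total

def _weights(xs, pars, model):
    """Equation 25."""
    if model == "sigma":
        return [(a + b * x) ** 3 / (2.0 * a) for x, (a, b) in zip(xs, pars)]
    return [(a + b * x) ** 2 / (2.0 * a + b * x) for x, (a, b) in zip(xs, pars)]

def combined_error(errors, model, direction, du=1e-3, inner=20, max_steps=100000):
    """Total error in the given direction (+1 or -1) for a list of (sigma_minus, sigma_plus)."""
    pars = [_params(sm, sp, model) for sm, sp in errors]
    xs = [0.0] * len(errors)   # at u = 0 all deviations vanish
    u, prev = 0.0, 0.0
    for _step in range(max_steps):
        u_next = u + direction * du
        for _pass in range(inner):   # re-evaluate the weights, then redistribute u (Equation 24)
            ws = _weights(xs, pars, model)
            total_w = sum(ws)
            xs = [u_next * w / total_w for w in ws]
        cur = _log_l(xs, pars, model)
        if cur <= -0.5:
            frac = (prev + 0.5) / (prev - cur)   # linear interpolation inside the last step
            return abs(u + direction * du * frac)
        u, prev = u_next, cur
    raise RuntimeError("the -1/2 line was never crossed")

# Backgrounds of 4 and 5 events, quoted as 4 -1.682 +2.346 and 5 -1.916 +2.581.
backgrounds = [(1.682, 2.346), (1.916, 2.581)]
print(combined_error(backgrounds, "variance", -1), combined_error(backgrounds, "variance", +1))
```

Warm-starting each step from the previous deviations keeps the inner passes few, which is the practical content of mapping the solution out from $u = 0$.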

This has also been implemented by a Java program obtainable at the web address
mentioned above. It has a similar user interface panel, and displays the form of $L(u)$ used
to read off the total $\Delta\ln L = -\frac{1}{2}$ errors.

5.1 An example of combination of errors

Suppose that $N$ events have been observed in an experiment, and to extract the signal
the number of background events must be subtracted. We suppose that there are several
such sources, determined by separate experiments, and that, for simplicity, these do not
have to be scaled; the backgrounds were determined by running the apparatus, in the
absence of signal, for the same period of time as the actual experiment.

Suppose that two backgrounds are measured, one giving 4 events and the other 5.
These are reported as $4^{+2.346}_{-1.682}$ and $5^{+2.581}_{-1.916}$ (again using the $\Delta\ln L = -\frac{1}{2}$ errors). This
method gives the combined error as $^{+3.333}_{-2.668}$. However in this case where the backgrounds
are combined with equal weight, one could just quote the total number of background
events as $9^{+3.342}_{-2.676}$. The method's error values are in impressive agreement with this. Further
examples are given in Table 2.

Inputs                  Linear $\sigma$            Linear $V$
                        $\sigma_-$   $\sigma_+$    $\sigma_-$   $\sigma_+$
3 + 6                   2.653        3.310         2.668        3.333
2 + 7                   2.653        3.310         2.668        3.333
1 + 8                   2.654        3.313         2.668        3.333
3 + 3 + 3               2.630        3.278         2.659        3.323
1+1+1+1+1+1+1+1+1       2.500        3.098         2.610        3.270

Table 2: Various combinations of Poisson errors which should give $\sigma_- = 2.676$, $\sigma_+ = 3.342$

6. Conclusions

If the full likelihood functions are not given, then there is no exact method for
combination of errors and results with asymmetric statistical errors. However the procedures
described here, which work by making an approximation to the likelihood function on the
basis of the quoted value and errors, appear to be reasonably accurate and robust. They
are also easy to implement and use.